RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models

Lin, Tianqianjin, Zhao, Xi, Zhang, Xingyao, Long, Rujiao, Xu, Yi, Jiang, Zhuoren, Su, Wenbo, Zheng, Bo

arXiv.org Artificial Intelligence

Reinforcement learning (RL) can refine the reasoning abilities of large language models (LLMs), but it critically depends on a key prerequisite: the LLM can already generate high-utility reasoning paths with non-negligible probability. For tasks beyond the LLM's current competence, such reasoning paths can be hard to sample, and learning risks reinforcing familiar but suboptimal reasoning. We are motivated by the insight from cognitive science that "Why is this the answer?" is often an easier question than "What is the answer?", as it avoids the heavy cognitive load of open-ended exploration, opting instead for explanatory reconstruction: systematically retracing the reasoning that links a question to its answer. We show that LLMs can similarly leverage answers to derive high-quality reasoning paths. We formalize this phenomenon and prove that conditioning on the answer increases the expected utility of sampled reasoning paths, thereby transforming intractable problems into learnable ones. Building on this insight, we introduce RAVR (Reference-Answer-guided Variational Reasoning), an end-to-end framework that uses answer-conditioned reasoning as a variational surrogate for question-only reasoning. Experiments in both general and math domains demonstrate consistent improvements over strong baselines. We further analyze the reasoning behavior and find that RAVR reduces hesitation, strengthens conclusion consolidation, and promotes problem-specific reasoning strategies.
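
To make the central claim concrete, here is a toy numerical sketch (pure Python; the path names, probabilities, and utilities are invented for illustration and are not the paper's code or proof). Conditioning the path distribution on the reference answer reweights probability mass toward paths that actually reach it, which raises the expected utility of sampled paths; utility here is simply the probability a path yields the correct answer.

# Toy illustration (not the paper's code): answer-conditioning upweights
# high-utility reasoning paths. Posterior p(z|q,a) is proportional to
# p(z|q) * p(a|z,q).
paths = {
    "familiar-but-wrong": {"prior": 0.80, "p_answer": 0.05},
    "rare-but-correct":   {"prior": 0.15, "p_answer": 0.90},
    "lucky-guess":        {"prior": 0.05, "p_answer": 0.50},
}

def expected_utility(dist):
    return sum(w * paths[z]["p_answer"] for z, w in dist.items())

# Question-only sampling: p(z | q).
prior = {z: v["prior"] for z, v in paths.items()}

# Answer-conditioned sampling: p(z | q, a).
joint = {z: v["prior"] * v["p_answer"] for z, v in paths.items()}
norm = sum(joint.values())
posterior = {z: w / norm for z, w in joint.items()}

print(f"E[utility | q]    = {expected_utility(prior):.3f}")     # 0.200
print(f"E[utility | q, a] = {expected_utility(posterior):.3f}")  # 0.680, strictly larger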




Assessing the risk of future Dunkelflaute events for Germany using generative deep learning

Strnad, Felix, Schmidt, Jonathan, Mockert, Fabian, Hennig, Philipp, Ludwig, Nicole

arXiv.org Artificial Intelligence

The European electricity grid is transitioning towards renewable energy sources, characterized by a growing share of offshore and onshore wind and solar power. However, the weather dependency of these energy sources poses a challenge to grid stability, with so-called Dunkelflaute events (periods of low wind and solar power generation) being of particular concern due to their potential to cause electricity supply shortages. In this study, we investigate the impact of these events on German electricity production in the years and decades to come. For this purpose, we adapt a recently developed generative deep learning framework to downscale climate simulations from the CMIP6 ensemble. We first compare the statistics of these downscaled simulations to the historical record taken from ERA5 data. Next, we use them to assess plausible future occurrences of Dunkelflaute events in Germany under a low (SSP2-4.5) and a high (SSP5-8.5) emission scenario. Our analysis indicates that both the frequency and the duration of Dunkelflaute events in Germany are projected to remain largely unchanged in the ensemble mean compared to the historical period. This suggests that, under the considered climate scenarios, the associated risk is expected to remain stable throughout the century.
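
Dunkelflaute events can be identified mechanically from a capacity-factor time series. The sketch below (numpy; the 0.2 threshold and 48-hour minimum duration are illustrative assumptions, not the paper's definition) flags sustained low-output periods.

import numpy as np

def dunkelflaute_events(cf, threshold=0.2, min_hours=48):
    """Return (start, length) of runs where the combined wind+solar
    capacity factor stays below `threshold` for at least `min_hours`.

    cf: 1-D array of hourly combined capacity factors in [0, 1].
    Threshold and duration are illustrative, not the paper's definition.
    """
    low = cf < threshold
    events, start = [], None
    for t, flag in enumerate(low):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_hours:
                events.append((start, t - start))
            start = None
    if start is not None and len(low) - start >= min_hours:
        events.append((start, len(low) - start))
    return events

# Example: a year of synthetic hourly capacity factors.
rng = np.random.default_rng(0)
cf = rng.beta(2.0, 3.0, size=8760)   # stand-in for combined wind+solar output
cf[1000:1060] = 0.05                 # inject one 60-hour lull
print(dunkelflaute_events(cf))       # e.g. [(1000, 60)]; run may extend if neighbours are also low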


Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

He, Shwai, Cai, Weilin, Huang, Jiayi, Li, Ang

arXiv.org Artificial Intelligence

The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation, optimizing the trade-off between performance and efficiency. However, under expert parallelism, MoE suffers from inference inefficiencies due to imbalanced token-to-expert assignment, where some experts are overloaded while others remain underutilized. This imbalance leads to poor resource utilization and increased latency, as the most burdened expert dictates the overall delay, a phenomenon we define as the Straggler Effect. To mitigate this, we propose Capacity-Aware Inference, comprising two key techniques: (1) Capacity-Aware Token Drop, which discards overflowed tokens to regulate the maximum latency of MoE, and (2) Capacity-Aware Token Reroute, which reallocates overflowed tokens to underutilized experts, balancing the token distribution. These techniques collectively optimize both high-load and low-load expert utilization, leading to a more efficient MoE inference pipeline. Extensive experiments demonstrate the effectiveness of our methods, showing a 0.2% average performance increase and a 1.94× inference speedup on Mixtral-8×7B-Instruct.
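
A minimal single-device sketch of the two techniques, assuming top-1 routing and a fixed per-expert capacity (the paper's expert-parallel, multi-device setting is abstracted away; function and variable names are ours):

import numpy as np

def capacity_aware_route(scores, capacity, mode="reroute"):
    """Toy sketch: scores is (num_tokens, num_experts) router logits,
    top-1 routing. "drop" discards overflow tokens; "reroute" sends them
    to the least-loaded expert that still has room."""
    num_tokens, num_experts = scores.shape
    assign = scores.argmax(axis=1)
    load = np.zeros(num_experts, dtype=int)
    out = np.full(num_tokens, -1, dtype=int)  # -1 means dropped

    # Process tokens in descending score order, so overflow drops the
    # tokens the router was least confident about.
    order = np.argsort(-scores[np.arange(num_tokens), assign])
    for t in order:
        e = assign[t]
        if load[e] < capacity:
            out[t] = e
            load[e] += 1
        elif mode == "reroute":
            spare = np.flatnonzero(load < capacity)
            if spare.size:
                e2 = spare[np.argmin(load[spare])]
                out[t] = e2
                load[e2] += 1
    return out, load

rng = np.random.default_rng(0)
scores = rng.normal(size=(16, 4))
scores[:, 0] += 2.0  # make expert 0 a hotspot
print(capacity_aware_route(scores, capacity=5, mode="drop")[1])     # expert 0 capped, tokens lost
print(capacity_aware_route(scores, capacity=5, mode="reroute")[1])  # all 16 tokens placed

Both modes bound the busiest expert's load at `capacity`, which is what caps the straggler's latency; reroute additionally preserves every token.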


Llama 3 Meets MoE: Efficient Upcycling

Vavre, Aditya, He, Ethan, Liu, Dennis, Yan, Zijie, Yang, June, Tajbakhsh, Nima, Aithal, Ashwath

arXiv.org Artificial Intelligence

Scaling large language models (LLMs) significantly improves performance but comes with prohibitive computational costs. Mixture-of-Experts (MoE) models offer an efficient alternative, increasing capacity without a proportional rise in compute requirements. However, training MoE models from scratch poses challenges like overfitting and routing instability. We present an efficient training recipe leveraging pre-trained dense checkpoints, training an 8-Expert Top-2 MoE model from Llama 3-8B with less than 1% of typical pre-training compute. Our approach enhances downstream performance on academic benchmarks, achieving a 2% improvement in 0-shot accuracy on MMLU, while reaching a Model FLOPs Utilization (MFU) of 46.8% during training using our framework. We also integrate online upcycling in NeMo for seamless use of pre-trained weights, enabling cost-effective development of high-capacity MoE models.
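
The general upcycling recipe can be sketched in a few lines of PyTorch: every expert starts as a copy of the dense FFN, and a freshly initialized router learns the top-2 dispatch. The module names below are ours and the sketch omits the paper's NeMo-specific online-upcycling machinery; note that with identical experts and renormalized top-2 weights, the upcycled model initially reproduces the dense FFN exactly.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Llama-style gated FFN (simplified stand-in for the dense MLP)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class UpcycledMoE(nn.Module):
    """Sketch of upcycling: each expert is a copy of the dense FFN, plus
    a newly initialized top-2 router."""
    def __init__(self, dense_ffn, d_model, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)
        w, idx = probs.topk(self.top_k, dim=-1)
        w = w / w.sum(dim=-1, keepdim=True)    # renormalize top-k weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += w[mask, k, None] * expert(x[mask])
        return out

dense = SwiGLU(d_model=64, d_ff=256)
moe = UpcycledMoE(dense, d_model=64)
x = torch.randn(10, 64)
# Identical experts + weights summing to 1 => output equals the dense FFN
# at initialization, giving training a well-behaved starting point.
print(torch.allclose(moe(x), dense(x), atol=1e-5))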


Informed along the road: roadway capacity driven graph convolution network for network-wide traffic prediction

Bian, Zilin, Gao, Jingqin, Ozbay, Kaan, Zuo, Fan, Zuo, Dachuan, Li, Zhenning

arXiv.org Artificial Intelligence

While deep learning has shown success in predicting traffic states, most methods treat it as a general prediction task without considering transportation aspects. Recently, graph neural networks have proven effective for this task, but few incorporate external factors that impact roadway capacity and traffic flow. This study introduces the Roadway Capacity Driven Graph Convolution Network (RCDGCN) model, which incorporates static and dynamic roadway capacity attributes in spatio-temporal settings to predict network-wide traffic states. The model was evaluated on two real-world datasets with different transportation factors: the ICM-495 highway network and an urban network in Manhattan, New York City. Results show RCDGCN outperformed baseline methods in forecasting accuracy. Analyses, including ablation experiments, weight analysis, and case studies, investigated the effect of capacity-related factors. The study demonstrates the potential of using RCDGCN for transportation system management.
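
A generic sketch of the underlying idea, assuming capacity attributes are simply concatenated to the traffic-state node features before a symmetric-normalized graph convolution; this mirrors capacity-driven convolution in spirit, and the authors' exact architecture differs.

import torch
import torch.nn as nn

class CapacityAwareGCNLayer(nn.Module):
    """Concatenate static capacity attributes (e.g. lanes, speed limit)
    and dynamic ones (e.g. lane closures) to the traffic-state features,
    then apply one symmetric-normalized graph convolution."""
    def __init__(self, d_state, d_static, d_dynamic, d_out):
        super().__init__()
        self.lin = nn.Linear(d_state + d_static + d_dynamic, d_out)

    def forward(self, adj, x_state, x_static, x_dynamic):
        # adj: (N, N) adjacency; x_*: (N, d_*) node features.
        a_hat = adj + torch.eye(adj.size(0))           # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = deg.pow(-0.5)
        norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        h = torch.cat([x_state, x_static, x_dynamic], dim=-1)
        return torch.relu(norm @ self.lin(h))

layer = CapacityAwareGCNLayer(d_state=2, d_static=3, d_dynamic=1, d_out=16)
N = 5
adj = (torch.rand(N, N) > 0.5).float()
out = layer(adj, torch.randn(N, 2), torch.randn(N, 3), torch.randn(N, 1))
print(out.shape)  # torch.Size([5, 16])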


Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

Zeng, Zhiyuan, Guo, Qipeng, Fei, Zhaoye, Yin, Zhangyue, Zhou, Yunhua, Li, Linyang, Sun, Tianxiang, Yan, Hang, Lin, Dahua, Qiu, Xipeng

arXiv.org Artificial Intelligence

Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism incurs redundant computation and memory costs due to unbalanced routing: some experts overflow, and their excess tokens are dropped, while other experts remain vacant and are padded with zeros, negatively impacting model performance. To address the dropped tokens and padding, we propose the Rectify-Router, comprising Intra-GPU Rectification and Fill-in Rectification. Intra-GPU Rectification handles dropped tokens, efficiently routing them to experts within the GPU where they are located to avoid inter-GPU communication. Fill-in Rectification addresses padding by replacing padding tokens with the tokens that have high routing scores. Our experimental results demonstrate that Intra-GPU Rectification and Fill-in Rectification effectively handle dropped tokens and padding, respectively. Furthermore, combining them achieves superior performance, surpassing the accuracy of the vanilla top-1 router by 4.7%.
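
A toy single-device sketch of Fill-in Rectification, assuming top-1 routing with a fixed capacity (function names are ours). The Intra-GPU variant additionally restricts rerouting to experts resident on the token's own GPU, which this sketch does not model.

import numpy as np

def fill_in_rectify(scores, capacity):
    """Capacity-limited top-1 routing, then fill each expert's vacant
    slots with the highest-scoring leftover tokens instead of zeros."""
    num_tokens, num_experts = scores.shape
    slots = {e: [] for e in range(num_experts)}
    assign = scores.argmax(axis=1)

    # First pass: honor top-1 choices up to each expert's capacity.
    dropped = []
    for t in range(num_tokens):
        e = assign[t]
        if len(slots[e]) < capacity:
            slots[e].append(t)
        else:
            dropped.append(t)

    # Second pass: fill vacant slots with the dropped tokens that score
    # highest for that expert (instead of padding with zeros).
    for e in range(num_experts):
        while len(slots[e]) < capacity and dropped:
            best = max(dropped, key=lambda t: scores[t, e])
            slots[e].append(best)
            dropped.remove(best)
    return slots, dropped

rng = np.random.default_rng(1)
scores = rng.normal(size=(12, 3))
scores[:, 2] += 2.0  # expert 2 overflows
slots, still_dropped = fill_in_rectify(scores, capacity=4)
print({e: len(ts) for e, ts in slots.items()}, still_dropped)  # all slots full, nothing dropped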


MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

Gale, Trevor, Narayanan, Deepak, Young, Cliff, Zaharia, Matei

arXiv.org Artificial Intelligence

Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software and hardware. These formulations force a tradeoff between model quality and hardware efficiency, as users must choose between dropping tokens from the computation or wasting computation and memory on padding. To address these limitations, we reformulate MoE computation in terms of block-sparse operations and develop new block-sparse GPU kernels that efficiently handle the dynamism present in MoEs. Our approach never drops tokens and maps efficiently to modern hardware, enabling end-to-end training speedups of up to 40% over MoEs trained with the state-of-the-art Tutel library and 2.4× over DNNs trained with the highly-optimized Megatron-LM framework. The past decade has seen significant progress in algorithms and high-performance software to make sparsity practically useful (Gray et al., 2017; Narang et al., 2017; Kalchbrenner et al., 2018; Elsen et al., 2020; Gale et al., 2020). However, existing hardware and software for deep learning make it difficult to meet this challenge: TPUs and their XLA compiler, for example, require all tensor shapes to be known statically.
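
The dropless idea can be emulated without custom kernels: sort tokens by expert and run one matmul per variable-sized group, so nothing is dropped and nothing is padded. MegaBlocks realizes this grouping with block-sparse GPU kernels; the numpy sketch below (names ours) shows the equivalent computation.

import numpy as np

def dropless_moe_ffn(x, assign, expert_weights):
    """Group tokens by expert and run one matmul per variable-sized
    group: no token dropping, no padding compute."""
    order = np.argsort(assign, kind="stable")   # gather tokens by expert
    x_sorted = x[order]
    counts = np.bincount(assign, minlength=len(expert_weights))
    out_sorted = np.empty_like(x_sorted)
    start = 0
    for e, n in enumerate(counts):
        if n:                                    # variable-size group
            out_sorted[start:start + n] = x_sorted[start:start + n] @ expert_weights[e]
        start += n
    out = np.empty_like(out_sorted)
    out[order] = out_sorted                      # scatter back to token order
    return out

rng = np.random.default_rng(0)
tokens, d, experts = 32, 8, 4
x = rng.normal(size=(tokens, d))
assign = rng.integers(0, experts, size=tokens)   # router's top-1 choice
W = rng.normal(size=(experts, d, d))
y = dropless_moe_ffn(x, assign, W)
# Matches applying each token's own expert independently:
ref = np.stack([x[t] @ W[assign[t]] for t in range(tokens)])
print(np.allclose(y, ref))  # True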